Human action recognition based on multi-scale feature maps from depth video sequences

Abstract

Human action recognition is an active research area in computer vision. Although great progress has been made, previous methods mostly recognize actions from depth video sequences at only one scale, and thus often neglect the multi-scale spatial changes that provide additional information in practical applications. In this paper, we present a novel framework with a mechanism to improve the scale diversity of motion features. We propose a feature map called Laplacian pyramid depth motion images (LP-DMI). First, we employ depth motion images (DMI) as templates to generate a static representation of actions. Then, we calculate LP-DMI to enhance the dynamic information of motions and reduce redundant information about human bodies. We further extract a multi-granularity descriptor called LP-DMI-HOG to provide more discriminative features. Finally, we utilize an extreme learning machine (ELM) for classification. The proposed method yields accuracies of 93.41%, 85.12%, and 91.94% on the public MSRAction3D, UTD-MHAD, and DHA datasets, respectively. Through extensive experiments, we show that our method outperforms state-of-the-art benchmarks.
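
The Laplacian pyramid step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`blur_downsample`, `upsample`, `laplacian_pyramid`, `reconstruct`), the 5-tap binomial kernel, the nearest-neighbour upsampling, and the choice of three levels are all assumptions made for illustration, and the input is any 2-D array standing in for a motion image. Each band keeps the detail lost between two consecutive scales, which is the general idea behind a multi-scale feature map.

```python
import numpy as np

def blur_downsample(img):
    """Smooth with a separable 5-tap binomial kernel, then keep every second pixel."""
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    padded = np.pad(img, 2, mode="reflect")
    # Filter along rows, then along columns ("valid" undoes the padding).
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    both = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)
    return both[::2, ::2]

def upsample(img, shape):
    """Nearest-neighbour 2x upsampling, cropped to the target shape (a simplification)."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    """Return `levels` band-pass images plus the coarsest low-pass residual."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        smaller = blur_downsample(current)
        # Detail lost when moving from this scale to the next coarser one.
        pyramid.append(current - upsample(smaller, current.shape))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid by adding each band back from coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = band + upsample(current, band.shape)
    return current

if __name__ == "__main__":
    # Hypothetical stand-in for a motion image; real input would come from depth video.
    image = np.random.default_rng(0).random((32, 32))
    for level, band in enumerate(laplacian_pyramid(image)):
        print(level, band.shape)
```

Because each band is defined as the difference between a scale and the upsampled next scale, adding the bands back in reverse order recovers the input exactly, so the decomposition loses no information. A descriptor such as HOG could then be computed per band, although how the paper does this is not stated in the abstract.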

Similar resources

Multi-Scale Locality-Constrained Spatiotemporal Coding for Local Feature Based Human Action Recognition

We propose a Multiscale Locality-Constrained Spatiotemporal Coding (MLSC) method to improve the traditional bag of features (BoF) algorithm, which ignores the spatiotemporal relationships among local features, for human action recognition in video. To model these relationships, MLSC incorporates the spatiotemporal position of each local feature into the feature coding process. It projects local fe...

Multi-Features Encoding and Selecting Based on Genetic Algorithm for Human Action Recognition from Video

In this study, we propose encoding multiple local features to recognize human actions. The multiple local features are obtained from simple feature descriptions of human actions in video. These simple features are of two important kinds, optical flow and edges, which represent human perception of video behavior. As the video information descriptors, optical flow and edge,...

Video Matting from Depth Maps

Image matting is the process of computing an alpha value for each pixel that corresponds to the amount of the pixel that is foreground and the amount that is background. This is often used to isolate the foreground and replace the background with a different image. Typically this is done using a special studio and a blue (or green) screen for easier segmentation. However, this method is not as ...

Video Matting from Depth Maps

Image matting is the process of taking an image, isolating the foreground, and replacing the background with a new image. This can be a hard problem when the background is unknown as it is fundamentally unconstrained. We look at an existing technique for foreground-background separation called Bayesian matting and improve upon it by adding depth information acquired by a time-of-flight range sc...

Action Recognition using Temporal Bag-of-Words from Depth Maps

In this paper, we present a methodology for human action recognition from a sequence of depth maps obtained using Microsoft Kinect. Specifically, we use a Temporal Bag-of-Words model as the representation scheme to capture the variation of features across the temporal domain. Our methodology builds the Temporal Bag-of-Words model on top of the spatiotemporal features extracted from interest points....

Journal

Journal title: Multimedia Tools and Applications

Year: 2021

ISSN: 1380-7501, 1573-7721

DOI: https://doi.org/10.1007/s11042-021-11193-4